Revolutionizing Interactive, Immersive Audio:

Effecting a New Standard for High–Quality Audio

Caleb Madsen

May 1, 2017

Updated July 17, 2018

From the early days of audio recording, capturing audio signals on a recording medium has evolved through waves of invention and commercial introduction of new technologies.  Currently, the audio industry is on the verge of another breakthrough.  Audio researchers are discovering ways to bring naturally immersive experiences into entertainment by studying how sound is interpreted by the brain.  New audio encoding techniques enhance the listener experience by delivering better–quality audio.  From home settings to commercial theaters and particularly to augmented and virtual reality experiences, the interactive feeling of “being there” is nearing mass–consumer availability.

Understanding what high–quality audio entails and why it’s important to the brain’s perception and interpretation is vital to achieving enjoyable, immersive experiences.  Because sound recording is in its digital age, there are currently multiple audio reproduction formats that are available, including those that keep the original quality but have prohibitive file sizes for delivering high–res audio, and those that degrade the quality of the delivered audio to reduce file size and don’t allow for high sample rates or bit–depth.  Recently, by incorporating neuroscience discoveries and rethinking the fundamental “laws” of digital recording, a new solution to deliver high–res audio in a manageable file size has been invented.

Sound travels through air as the vibration of air molecules, which the brain interprets as sound.  After a sound is created, the vibration travels away from the sound source as a spherical wave front.  Human ears pick up on this vibration and translate it into a signal perceived as sound.  One aspect of human hearing is the ability to determine which direction sound is coming from.  This ability to perceive a sound as coming from the direction heard first, before other reflections reach the ears, is called the precedence effect.  As the ears receive sounds, the perceived timing creates a representation of a 3D sonic landscape in the brain.  Traditional digital recording and playback methods create an unnatural pre– and post–ring for recorded sounds because the audio passes through the filters of digital converters on the way in and on the way out.  This impairs the time resolution, which results in a “ring” before and after the attack and decay of sounds.  In other words, digital filters cause a ring that blurs the audio, degrading the brain’s ability to fully create a map of the intended sonic landscape.  This problem has been an inherent part of digital audio because the Nyquist–Shannon sampling criterion requires the signal to be limited to frequencies below half the sample rate, and the steep low–pass filters used to avoid aliasing (a type of distortion) cause the blurring, or ringing, of transients.  Thus, although higher sample rates help reduce the effects of the filters, they don’t eliminate them.  Master Quality Authenticated (MQA) technology has solved this fundamental problem.
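To make the filtering constraint concrete, the short Python sketch below computes the Nyquist frequency (half the sample rate) and the width of the band in which an anti–aliasing filter has to roll off, for several common sample rates.  The 20 kHz figure for the upper edge of the audible band is a conventional assumption rather than a value taken from this paper, and the function name is purely illustrative.

```python
# Assumed upper edge of the audible band; a common convention, not a value from the paper.
AUDIBLE_LIMIT_HZ = 20_000


def transition_band_hz(sample_rate_hz, passband_edge_hz=AUDIBLE_LIMIT_HZ):
    """Return (nyquist_hz, transition_width_hz) for a given sample rate."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz, nyquist_hz - passband_edge_hz


for rate in (44_100, 48_000, 96_000, 192_000):
    nyquist, width = transition_band_hz(rate)
    print(f"{rate:>7} Hz: Nyquist {nyquist:>8.0f} Hz, room for filter roll-off {width:>8.0f} Hz")
```

Under these assumptions a 44.1 kHz recording leaves only about 2 kHz for the filter to go from passing to blocking, which is why the filter must be steep, while a 96 kHz recording leaves 28 kHz and allows a much gentler slope.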

MQA has created a way to remove any blurring caused during the recording process.  MQA provides time–accurate audio and has a process that “folds” the high–res file into a slightly–larger–than CD file in a regular file format.  To reduce blur, recording at a high–resolution, high–sample rate and bit depth (96 kHz, 24–bit or better) is still desired with MQA.  By extensively researching audio from the neuroscience perspective, the patented MQA encoding process can fold the high–res, time–accurate audio information into a file only 20% of the size of traditional delivery formats.  This is a huge achievement for high–quality audio delivery.  Because of this, high–res files can be streamed online and downloaded faster.  MQA is also backwards compatible.  To “unfold” the file to its full, lossless, high–res, mastered quality, bit–for–bit, requires playing the file through a digital to analog audio converter (DAC) that supports MQA.  However, if it’s being played on a DAC without MQA support, it’ll still play at better–than–CD quality — just not the full, high–res encoded quality.

MQA can be used with any lossless or uncompressed format (e.g. FLAC, WAV, etc.).  Lossless means that no information is lost when the audio is encoded; any data compression that is applied doesn’t throw out any audio.  For example, if an engineer has a project and exports it to a .WAV file, the .WAV file would contain all the information that was created in the original project.  However, if the engineer exports it as a lossy file, such as an .mp3, audio information would be lost, degrading the listener’s experience.  Lossy compression formats literally discard audio that is in the original mix to save space, creating a smaller file.  This loss of information reduces stereo width, which makes the mix feel closer to mono, increases distortion, rolls off low and high frequencies, and alters other aspects of the original file.  In turn, this information loss also aggravates the brain.  If compressed files are used, the listener only gets some of the intended sound.  For example, the best .mp3 files only have a bit rate of 320 kbps.  This is much less data than CD quality files, which have a bit rate of 1411 kbps, and even less than the original source files from the studio that created the original high–quality, 24–bit, high sample rate recordings (such as 88.2 kHz, 96 kHz, or higher).
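The bit rates quoted above follow from simple arithmetic: for uncompressed PCM audio, bit rate equals sample rate times bit depth times channel count.  The sketch below reproduces the CD figure and, assuming two channels, shows what a 24–bit, 96 kHz studio file works out to.  The function name is illustrative only.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


cd_kbps = pcm_bit_rate_kbps(44_100, 16)      # CD quality: 16-bit, 44.1 kHz, stereo
hires_kbps = pcm_bit_rate_kbps(96_000, 24)   # studio file: 24-bit, 96 kHz, stereo (assumed)
mp3_kbps = 320                               # highest standard .mp3 bit rate

print(f"CD:      {cd_kbps:.1f} kbps")
print(f"Hi-res:  {hires_kbps:.1f} kbps")
print(f"MP3 carries {mp3_kbps / cd_kbps:.0%} of the CD data rate")
```

These are uncompressed rates; a lossless codec such as FLAC reduces the stored size without discarding audio, so actual file sizes depend on the material.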

Being able to fold high–res audio into a regular file format that’s just a little bigger than a CD–quality file is revolutionary.  The ability to stream these high–res files over the internet and download the files at faster speeds allows content creators to deliver the intended experience without using massive amounts of storage space.  Additionally, with the crispness of the audio restored by MQA’s process, listeners’ brains will also better interpret the space of the sound field intended by the content creators.  This is an important aspect that shows the need for high–quality audio reproduction to create a deeply immersive experience in a 3D space.

When combined with MQA, object–based audio could take advantage of the new delivery advancements.  Object–based audio is a reproduction format for spatial audio being explored to increase the degree of immersion for audio experiences.  With object–based audio, sounds are represented as individual objects that are encoded with metadata.  Instead of creating many mixes for multiple formats, such as stereo, 5.1 surround, 7.1 surround, and beyond, digital audio objects have descriptions of their representation sent to a renderer.  This provides flexibility to play on various systems as well as the ability to create a highly interactive experience using the encoded metadata.

The metadata can be used to describe a sound’s position in a room, the size of the objects, and more.  Object–based audio reproduction can scale from small systems such as mobile phones to large systems with many speakers and the interactivity with the user creates a greater sense of immersion than before.  Two major audio companies that have been creating object–based systems are Dolby and DTS; Dolby created Atmos and DTS created DTS:X.  Both systems are object–based audio reproduction solutions and have commercial theater and home theater versions, allowing the same mix to be played in a large theater and at home.  These systems can encode 24–bit lossless audio with sample rates up to 96 kHz.  If these products, which already use high–res audio, add MQA support, they will increase the object–based audio immersive experience with the blurring effect resolved.
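As a rough illustration of the idea, the sketch below models a sound object as a small record carrying its metadata and serializes a scene for a renderer.  Every field name here is hypothetical and chosen only for illustration; this is not the metadata schema of Atmos, DTS:X, or any other real system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AudioObject:
    """Hypothetical audio object: an audio asset reference plus rendering metadata."""
    name: str
    position_m: tuple      # (x, y, z) in meters, relative to the room origin
    size_m: float          # apparent size of the object in meters
    gain_db: float = 0.0   # level offset applied by the renderer


scene = [
    AudioObject("footsteps", (1.5, 0.0, -2.0), 0.3),
    AudioObject("rain", (0.0, 2.5, 0.0), 4.0, gain_db=-6.0),
]

# One description of the scene is sent; each renderer maps it to its own speakers.
payload = json.dumps([asdict(obj) for obj in scene], indent=2)
print(payload)
```

The design point is that the scene description carries no speaker layout at all, so a phone, a soundbar, and a theater could each render the same payload to whatever loudspeakers they have.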

Another aspect to solve is the limitation of a traditional sweet spot, the position where the listener’s head is equidistant from the loudspeakers.  When the listener’s head moves out of the sweet spot, the Interaural Level Difference (ILD) and the Interaural Time Difference (ITD) at the listener’s ears change, so the mix is no longer heard as intended.  As the listener’s head gets closer to one side, the audio image collapses toward the closer speaker, or speakers in a surround sound environment, creating a lop–sided perceived image.  An object–based audio reproduction system with support for head–tracking can solve this problem by adapting the listening position in real–time.  By adjusting in real–time for the current position of the listener’s ears, the system greatly increases interactivity and the sense of immersion.  This is especially important in the average domestic listening environment, where chairs move, listeners move, and the sweet spot is therefore not necessarily lined up with the couch or other furniture where listeners will be sitting.  Without compensation for head–movement, the compromised spatial image prevents the listener from perceiving the correct location of the audio, limiting immersion.
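The effect of leaving the sweet spot can be estimated with basic geometry.  The sketch below is a simplification: it treats the head as a single point and computes how much earlier and louder the nearer loudspeaker arrives, rather than modeling the true ITD and ILD between the two ears.  The speed of sound, the inverse–distance level law, and the speaker positions are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def arrival_differences(listener, left_speaker, right_speaker):
    """Return (delay_ms, level_db) by which the right speaker leads the left.

    Positions are (x, y) tuples in meters. Positive values mean the right
    speaker's signal arrives earlier and louder than the left speaker's.
    """
    d_left = math.dist(listener, left_speaker)
    d_right = math.dist(listener, right_speaker)
    delay_ms = (d_left - d_right) / SPEED_OF_SOUND_M_S * 1000
    level_db = 20 * math.log10(d_left / d_right)  # free-field inverse-distance law
    return delay_ms, level_db


left, right = (-1.0, 2.0), (1.0, 2.0)                 # hypothetical stereo pair
print(arrival_differences((0.0, 0.0), left, right))   # centered listener
print(arrival_differences((0.5, 0.0), left, right))   # listener 0.5 m to the right
```

With this layout, moving half a meter to the right makes the right speaker lead by a little over a millisecond and a couple of decibels, which is enough for the precedence effect to pull the image toward that speaker.  A head–tracked system can apply the opposite delay and gain to compensate.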

To control the virtual position of the audio objects relative to the listener, two panning methods can be applied.  The first is Vector Base Amplitude Panning (VBAP) which is used for low frequencies, under 700 Hz.  The second is Vector Base Intensity Panning (VBIP) which is used for frequencies over 700 Hz.  By using the amplitude panning techniques, the loudspeaker signals adjust for the listener’s movement, keeping the perceived location of the sound objects static, giving motion parallax cues to the listener.  This creates better depth and awareness of the space.  “Amplitude panning is the process of sending identical signals with various gains to a few loudspeakers.  The listener receives the signals in phase from different directions and perceives a single image.”[1]  These different modifications of loudspeaker signals deliver the audio at the correct timing and amplitude for the listener in various positions.
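A minimal sketch of two–loudspeaker amplitude panning in the horizontal plane is shown below, following the standard published VBAP formulation: the source direction is written as a weighted sum of the two loudspeaker direction vectors, and the weights become the gains.  The constant–power normalization and the speaker angles at plus and minus 30 degrees are assumptions, and this is not presented as the Southampton team’s implementation.  VBIP is not shown.

```python
import math


def _unit(angle_deg):
    """Unit vector pointing at angle_deg (0 = straight ahead, positive = left)."""
    a = math.radians(angle_deg)
    return math.cos(a), math.sin(a)


def vbap_gains_2d(source_deg, left_deg=30.0, right_deg=-30.0):
    """Return (g_left, g_right) placing a virtual source between two speakers."""
    px, py = _unit(source_deg)
    ax, ay = _unit(left_deg)
    bx, by = _unit(right_deg)

    # Solve g_left * a + g_right * b = p for the two gains (Cramer's rule).
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    g_left = (px * by - py * bx) / det
    g_right = (ax * py - ay * px) / det

    # Normalize so g_left^2 + g_right^2 = 1, keeping loudness constant (assumption).
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm


print(vbap_gains_2d(0.0))    # centered source: equal gains
print(vbap_gains_2d(30.0))   # source at the left speaker: all signal to the left
print(vbap_gains_2d(15.0))   # halfway toward the left speaker
```

When head–tracking reports a new listener position, the speaker angles as seen from the listener change, and recomputing the gains with those angles keeps the virtual source fixed in the room.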

The 3D audio scientists at the University of Southampton’s Institute for Sound and Vibration Research have been working on widening the breadth of knowledge about human perception and cognition of audio objects as well as how to deliver spatial audio to home users with different listening setups and environments.  The research team has developed two prototypes, one using 30 speakers in a 1.6–meter frame and the second using 16 speakers in a 1–meter footprint.  The prototypes use a Microsoft Kinect to track the listener’s head–movement in real–time, and the result is quite an impressive experience.  In addition to the techniques above, the prototypes use both constructive and destructive sound cancellation and binaural audio to create a three–dimensional space for the listener at any position in the room.  The sound objects truly sound as if they are coming from many directions, not just from the speakers.  When experiencing the head–tracking capabilities, some of the sounds will feel as if the sound objects are off to the side, making the listener turn to see what made the sound.  The team’s prototype is a great example of object–based audio in action combined with head–tracking and binaural audio.  Marcos Simón, a Research Fellow within the Institute, says “Whilst binaural audio is mostly consumed with headphones, it is also possible to listen to it with loudspeakers.  The first applications of binaural reproduction date back several decades but have needed the appearance of modern digital signal process robustness to become more popular.  …  Now though, it is possible to adapt this sweet spot to the listening position in real time using computer vision systems, making binaural reproduction with loudspeakers an exciting reality.”[2]

With advances in technology, new ways of delivering audio are becoming more accessible and practical.  As more immersive experiences reach a wider audience, the quality expectations for audio will increase.  Users will no longer have the problem of needing their specific setup aimed at a specific listening position but rather, with head–tracking, will be able to hear sounds correctly from any position in a room.  This advancement in audio is also able to deliver more immersive audio for games.  The augmented and virtual reality genre (AR/VR) of video games, utility apps, and educational programs has been waiting for this type of audio solution to gain experiential believability for their products.

Gamers’ interest in being immersed in their games’ environments is driving advancements in the way games are created and delivered.  AR/VR gaming is on the rise, with several companies having made a successful entrance into the space in 2016, and AR/VR is expanding into utility uses, education experiences, and more.  Graphics have come a long way and creators are pushing the limits of their graphics technology.  However, graphics only address the visual experience of a gamer.  Audio must rise to the level of immersion needed to support players feeling fully immersed in games.

While a lot of effort has gone into creating impressively immersive, high–quality graphics and storylines, the audio experience has lagged behind.  If a player puts a VR headset on, but his auditory sense can tell that he isn’t really “there” (in the game’s setting), it is not yet a truly immersive experience.  Without the player’s senses in sync, the brain will pick up on the conflicting cues from different senses and the brain will fight being fully immersed.  This keeps AR/VR in an “up–and–coming” category as opposed to having rapid mass–market appeal.

Companies are currently exploring and researching how best to handle audio for AR/VR.  For instance, the gaming company Valve recently acquired Impulsonic whose goal is to bring acoustics and VR together.  Impulsonic’s game engine plugins are physics–based audio tools used to render binaural audio, calculating the player’s location in the game’s environment, affecting the audio in real–time.  Although this is a great advancement for audio in games, without game developers delivering high–quality, high–res audio, the experience will still lack the immersive feeling.

Not only is a better delivery format needed in virtual experiences, such as object–based audio with head–tracking and physics–based audio tools, but the quality of the audio being delivered must be paid attention to as well.  Using MQA’s revolutionary encoding to achieve the full impact of the intended audio experience, the goal to have a quality, immersive experience for the user is much easier to achieve.  The audio industry, gaming industry, virtual utility application creators, virtual education application creators, etc., must all pay attention to the quality of audio the end–user gets to hear.  If both high–quality graphics and high–quality, immersive audio are delivered to the end–user, the audio won’t irritate the brain and the virtual experience will be much more captivating, adding fuel to the interest already shown for virtual games and applications.

Consumer awareness of high–quality audio has been dulled by widely available, compressed, low–quality formats.  Most internet music streaming services continue to stream low–quality, compressed, lossy formats.  Tidal is one of the few services that has ventured into streaming high–quality music.  After offering subscriptions to CD quality music in their HiFi tier for the last two years, the service began to offer the highest possible resolution (master quality) music in January 2017, using the high–quality files the mastering engineer created in the studio encoded with MQA.  This has proved that high–res audio can be streamed through the internet and that there is a market for it.

Content delivered to home entertainment systems is another space that is rapidly losing barriers to delivering quality audio.  As bandwidth increases to support 4K streaming for video, audio quality can increase along with it.  Thankfully, Blu–ray discs offer much more space than DVDs, enough to encode high–res audio.  Several audio formats can appear on Blu–ray discs, some lossy and some lossless.  However, a user who wants to hear the high–quality audio must make sure the player, TV, etc., can all recognize and play the high–quality format.  Unfortunately, unless consumers are audiophiles, they are currently unlikely to pay attention to the audio specifications for each piece of their home setup equipment.  For example, if a consumer has equipment using DTS’ “backwards compatible” format, the audio will still play even if the equipment can’t play the full–quality file.  This can mislead the consumer, since a drop to the lossy version is far less obvious than pure silence would be.  However, this may change in the future.  Companies such as Sony are creating 4K Ultra HD Blu–ray players that advertise Hi–Res Audio capabilities.  With more companies supporting Hi–Res 24–bit 96 kHz+ audio, and 3D audio formats, the public will become more educated about audio quality differences and consumers will start desiring better quality audio.

As companies learn to address all the senses when developing virtual experiences, the results will become more immersive.  Eye–catching graphics are continually improving, but high–quality audio must also be delivered.  Although consumer education is currently an obstacle, the technological advancements that make it possible to deliver high–quality audio will foster awareness.  Consumers will naturally catch on as they gravitate toward the better sounding virtual experiences, which will result in greater awareness of room acoustics.

Room acoustics is another important aspect of perfecting the listener’s experience.  If the acoustics of a room or venue prevent the advanced audio system from sounding good, the system is not able to deliver the full intended immersive experience.  Depending on the size of a space and the amount of work, the process of hiring an acoustic service company to diagnose acoustic issues for a space and order acoustic treatment to be installed can be quite involved.  AR acoustic utility applications can help streamline the process.

Many spaces are built in a rectangular shape which creates many acoustic problems such as echoes, reverb issues, bass build–up, etc.  There are three main aspects of acoustics that impact the sound of an audio reproduction system in a room or venue — modes, nodes, and antinodes.  Simply put, a mode is a frequency that will resonate in a room based on the dimensions of the room.  One of the main aspects of modes to be avoided is wide frequency gaps between each mode.  If the modes are equally spaced, they will not be noticeable, but if there are gaps between the modes, they will be quite audible and will need treatment.  Secondly, nodes are the cancelation points in a room where sound waves collide out of phase and cancel each other out, creating null points in the room for that frequency.  Lastly, antinodes are the buildup points in a room where sound waves combine in phase and create peaks of the frequency in different positions in the room.  Nodes and antinodes must be avoided where the listener will be located (i.e. the listening position or multiple listening positions with the head–tracking allowing movement through the room).  These three aspects of acoustics will need to be acoustically treated to let the advanced audio reproduction system shine.
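For an idealized rectangular room with rigid walls, mode frequencies follow the standard formula f = (c / 2) · sqrt((nx / Lx)² + (ny / Ly)² + (nz / Lz)²), where Lx, Ly, and Lz are the room dimensions and nx, ny, and nz are whole–number mode indices.  The sketch below lists the first few axial modes (one non–zero index) for a hypothetical 5 m × 4 m × 2.5 m room; the dimensions and the speed of sound are assumptions chosen for illustration, and real rooms with absorptive or irregular surfaces will deviate from these values.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def room_mode_hz(dims_m, indices, c=SPEED_OF_SOUND_M_S):
    """Mode frequency of an ideal rigid-walled rectangular room."""
    return (c / 2) * math.sqrt(sum((n / d) ** 2 for n, d in zip(indices, dims_m)))


room = (5.0, 4.0, 2.5)  # hypothetical length, width, height in meters

# Axial modes: only one index is non-zero, so sound bounces between one pair of walls.
axial = sorted(
    (room_mode_hz(room, idx), idx)
    for axis in range(3)
    for n in (1, 2, 3)
    for idx in [tuple(n if i == axis else 0 for i in range(3))]
)

for freq, idx in axial:
    print(f"mode {idx}: {freq:6.1f} Hz")
```

In this example the second length mode and the first height mode both land near 68.6 Hz because the length is exactly twice the height.  Coinciding modes of this kind, and the wide gaps they leave elsewhere, are the uneven spacing described above.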

The aim for using a 3D audio reproduction system is to have an impressively immersive audio experience, but if the room or venue prevents any system from sounding good, the great experience won’t happen.  To diagnose the modes, nodes, and antinodes of a room or venue, acoustic service providers can measure the space and calculate the results.  If it’s a complex project, the room or venue can be drawn up as a CAD (Computer Aided Design) file and loaded into advanced acoustic software.  This advanced acoustic software can then run simulations and perform advanced calculations for examining how sound will bounce around the venue, reflect off walls, absorb into different materials, etc.  Once the results are calculated, the service provider can recommend acoustic treatments for the space to the client.  There are many different types of acoustic treatments and each type of treatment will need to be placed in the best possible location to absorb or deflect the problematic frequencies to combat the acoustic issues.  For example, bass traps are large and absorb low frequency energy.  If in the calculated results the left corner needs low frequency absorption, the service provider will recommend that location for the treatment to the customer.  Depending on the level of advanced results the customer wants, the project can become quite complex.

AR utility applications can help improve this process.  By incorporating new advanced scanning technology, AR acoustic utility apps can rapidly scan, calculate, and diagnose acoustic properties of a room or venue.  The first example of the necessary scanning technology on mobile devices was Google’s Tango technology platform for Android.  Tango first launched to consumers in 2016.  Tango incorporated depth–sensing and motion–tracking sensors to provide augmented reality and virtual reality inside–out tracking as well as area learning.  Developing their own solution, Occipital created their Structure Sensor as a cross–platform, external sensor package able to plug into various devices.  As more companies create their own platforms with the necessary sensors, AR acoustic utility applications will be increasingly accessible.  With these sensors, the service provider will be able to diagnose acoustic problems more quickly and easily.  They will also have the ability to augment the space they’re diagnosing to show the customer a view of the modes, nodes, and antinodes throughout the space.  Additionally, these types of applications will allow virtual acoustic treatment to be added to the virtual model of the space.  Service providers can use this to show their customers how the treatments will look and how they will improve the room or venue’s acoustics before an order is even placed, improving customer satisfaction.

One company that is pursuing this AR acoustic utility revolution is Acoustic Masterminds® Inc.  They have published an AR utility application called AcoustiTools® that shows off some of what’s possible with the new AR technology for audio.  However, their advanced upcoming business–to–business AR acoustic utility application will scan in a venue’s dimensions and construct a model of the venue without the need to import a separately created CAD model.  Then, the app will calculate room modes and other problematic acoustic issues for analysis and treatment solutions.  By using the advanced sensor technology, including motion tracking, area learning, and depth perception, the company’s mission is to make it convenient to diagnose acoustic issues in every environment, bringing excellent, enjoyable acoustic experiences to the world.

Modes, nodes, and antinodes are commonplace in rooms and venues.   With the advancement in sensing technology, mobile devices will be able to rapidly scan in a room or venue allowing acoustic service providers to complete their work much more easily and quickly.  Customers, whether home owners looking to create a high–quality home listening environment, studio owners, theater owners, or others will have more affordable access to acoustic solutions to enhance the achievement of immersive experiences.

The use of emerging, cutting–edge technology will spur the next evolution of audio.  When combined, applications, advanced audio formats, high–res audio, head–tracking, and quality audio delivery will revolutionize the immersive and interactive audio experience from games, to utility applications, education applications, and beyond.  Continued innovation of immersive audio will merge the user’s real–world acoustic environment with the application’s virtual environment.  Melding real and virtual environments in AR will propel this audio revolution, establishing a new standard for high–quality audio everywhere.


"3D Audio." Blue Ripple Sound. Accessed February 23, 2017.

Andrews, Tony. "Bad sound is a vexation to the spirit." YouTube. May 02, 2012.

Cook, Perry R., John Chowning, Brent Gillespie, Daniel J. Levitin, Max Mathews, John Pierce, and Roger Shepard. Music, cognition, and computerized sound: an introduction to psychoacoustics. Cambridge, MA: MIT Press, 1999.

Cousin, Michael, and Filippo Fazi. "Research project: Complex sound field representation." University of Southampton. Accessed February 23, 2017.

Deleflie, Etienne. "Interview with Simon Goodwin of Codemasters on the PS3 game DiRT and Ambisonics." Etienne Deleflie. August 30, 2007.

"DELIVERING THE MQA EXPERIENCE TO YOU." MQA. Accessed February 23, 2017.

Elliott, Ben, and Jon Hughes. "Sonic Horizons of the Mesolithic: using sound to engage wider audiences with Early Holocene research." World Archaeology 46, no. 3 (August 2014): 305–318. Academic Search Premier, EBSCOhost.

Franck, Andreas, and Filippo Fazi. "Research project: Describing sound scenes for next–generation audio." University of Southampton. Accessed February 23, 2017.

Freedman, David H. "IMPATIENT FUTURIST." Discover 32, no. 5 (June 2011): 24–25. Academic Search Premier, EBSCOhost.

Galvez, Marcos F Simón, Dylan Menzies, Russell Mason, and Filippo Maria Fazi. "Object–Based Audio Reproduction using a Listener–Position Adaptive Stereo System." Audio Engineering Society. October 25, 2016.

Goodwin, Simon N. "3D SOUND FOR 3D GAMES – BEYOND 5.1." Codemasters. February 1, 2009.

Goodwin, Simon N. "HOW PLAYERS LISTEN." Codemasters. February 1, 2009.

Harley, Robert. "Master Quality Authenticated (MQA): The View From 30,000 Feet." The Absolute Sound. March 24, 2016.

Harley, Robert. "MQA’s Unexpected Twist." The Absolute Sound. January 12, 2017.

Harley, Robert. "The Move to Make Hi–Res Mainstream." The Absolute Sound. June 13, 2016.

Horsburgh, Andrew J. "A Comparative Analysis of Surround Sound Formats and the Current Practical Applications of Ambisonics." Scribd. 2011.

Horsburgh, Andrew J. "Using a Non–Standard Audio Toolkit to Produce Standard Spatial Audio Mixes." Scribd. August 2012.

"How it works." MQA. Accessed February 23, 2017.

"HOW WE GET MQA INTO YOUR HANDS." MQA. Accessed February 23, 2017.


Jon, 9, and Anthony Mattana. "Audio For Augmented & Virtual Realities." Canadian Musician 38, no. 3 (May 2016): 31. Academic Search Premier, EBSCOhost.

Kindig, Steve. "Intro to high–resolution audio: Music downloads that get back to great sound." Crutchfield. Accessed February 23, 2017.

Klier, Michael. "Impulsonic Acquired By Valve." Designing Sound. January 15, 2017.

Lendino, Jamie. "Why Your MP3s Sound Bad: High–Resolution Audio Explained." PCMag. February 01, 2012.

Menzies, Dylan, and Filippo Fazi. "Research project: Fundamentals of spatial audio reproduction." University of Southampton. Accessed February 23, 2017.

"MQA | Home" MQA. Accessed February 23, 2017.

"MQA FOR ARTISTS AND MANAGERS." MQA. Accessed February 23, 2017.

"MQA FOR ENGINEERS AND PRODUCERS." MQA. Accessed February 23, 2017.

"MQA FOR LABELS." MQA. Accessed February 23, 2017.

"MQA FOR PLAYBACK PROVIDERS." MQA. Accessed February 23, 2017.

"MQA FOR STREAMING AND DOWNLOAD." MQA. Accessed February 23, 2017.

"MQA | The Tech." MQA. Accessed February 23, 2017.

Nurse, Jon. "Sound Virtualiser soundbar: immersive 3D audio." Future Worlds. October 27, 2016.

PEOPLES, GLENN. "BACK FROM THE AUDIO ABYSS." Billboard 123, no. 21 (June 18, 2011): 18–19. Academic Search Premier, EBSCOhost.

"PHONON." Impulsonic. Accessed February 23, 2017.

"Research." S3A Future Spatial Audio. Accessed February 23, 2017.

"Stream 1." S3A Future Spatial Audio. Accessed February 23, 2017.

"Stream 2." S3A Future Spatial Audio. Accessed February 23, 2017.

"Stream 3." S3A Future Spatial Audio. Accessed February 23, 2017.

"Stream 4." S3A Future Spatial Audio. Accessed February 23, 2017.


Tsang, P., K. Cheung, and A. Leung. "Decoding ambisonic signals to irregular quad loudspeaker configuration based on hybrid ANN and modified tabu search." Neural Computing & Applications 20, no. 7 (October 2011): 983–991. Academic Search Premier, EBSCOhost.

Weiss, Todd R. "Samsung Experimental 4D Headphones Aim to Add to the VR Experience." Eweek (March 17, 2016): 9. Academic Search Premier, EBSCOhost.

[1]. Menzies and Fazi, Fundamentals of spatial audio reproduction, University of Southampton.

[2]. Nurse, Sound Virtualiser soundbar: immersive 3D audio, Future Worlds, October 27, 2016.