
Immersive Sound, Now and Then

2022-04-30

Theme: sound & person, Music in general

■ What is Immersive?

The word ‘immersive’ has become much more common lately. It has been used in the film and audio industries for some time, but so rarely that I often had to look up its meaning in an online dictionary. These days, however, not knowing the term would be embarrassing for a professional.

With the release of the PlayStation 5, the world of VR has seen some excitement. NHK’s World Sport x MLB has also experimented with segments like Zeus and Virtual Stadium, though these may be difficult for viewers to grasp. (In VR, even sportscaster Shuko Yamamoto could theoretically hit pitches from Cy Young winner Trevor Bauer or Shohei Ohtani.) Trial sessions of PS5 and VR experiences sometimes appear on variety shows and in YouTube videos, but they often just show TV personalities (YouTubers included) screaming excitedly, leaving many viewers thinking, “I don’t get it at all.” ‘Immersive’ is the word that tends to come up in such moments, though even that is becoming less common lately.

The word ‘immersive’ means “providing immersion in something.”

It may be a somewhat abstract term, but it might be easier to understand if we think of it as “a sensation of being in another space.” The term is often used interchangeably with spatial audio, though more people may prefer ‘immersive’ nowadays.

■ The Current State of the Immersive Industry

I have experience working with 5.1-channel and 7.1-channel surround sound for films, as well as Dolby Atmos and DTS:X. Unfortunately, in Japan, there are only a limited number of projects produced in Atmos or DTS:X. Occasionally, up-mixing (converting 5.1 audio into Atmos or DTS:X) is done for promotional purposes, and niche productions sometimes create content in DTS Headphone:X. However, immersive audio is far from being a standard practice in Japan’s film and video production industry.

The gaming industry is where immersive techniques are being most actively adopted and are expected to expand further. Personally, I believe immersive experiences are not well-suited for films. Movies and dramas are best enjoyed from a detached perspective, which enhances their storytelling depth. Unlike what VR advertising suggests, films are not something to be ‘experienced’. Immersive techniques might work well for certain horror movies, but movie theaters are not haunted houses, so their effectiveness has limits. While surround sound up to 5.1 channels may be meaningful, every time I watch an Atmos screening, I wonder how effective 3D sound really is on a 2D screen. 3D screenings were a trend for a while, but they didn’t evolve significantly and seem to be declining in number. Just as books remain books, movies will likely remain movies.
There is also the dummy head microphone, but it seems to be used mainly for niche applications, such as ASMR experiences where listeners want whispers close to their ears.

VR technology still has a long way to go. The bulky headsets and the need for adequate physical space are significant limitations. Until these physical constraints are overcome, VR may not see widespread adoption. Remember the Nintendo 3DS? Unless hardware advances to the level of Star Trek’s Holodeck, VR will remain a niche technology for enthusiasts. Since VR is still in its early stages, we can expect significant improvements in content presentation and user experience. Ultimately, technologies that fail to transform our daily lives will struggle to gain mainstream acceptance.

■ The Challenges of Immersive Sound

Within the immersive industry, sound appears to be facing some difficulties. Many 3D audio software solutions are emerging, but no definitive, dominant software has yet been established. Since setups like Dolby Atmos, which require multi-speaker environments, are not feasible for everyone, headphone-based 3D audio mixing, like DTS Headphone:X, is likely to become the norm (if it hasn’t already).

Creating a realistic 3D sound field using just two sound sources (headphones) is an extremely challenging task. Even if headphones contain multiple drivers, the distance to the listener’s ears remains the same, meaning the sound source is still perceived as coming from two points. This is the fundamental challenge of immersive sound. With visual information, we can stitch together 360-degree images from multiple cameras, creating an immersive environment, as seen in technologies like Google Earth. Why can’t the same be done with audio? Let’s explore this further.

■ The Differences Between Vision and Hearing

Understanding the differences between hearing and vision can provide insight into the challenges of immersive audio. The human field of vision spans approximately 200 degrees horizontally and about 125 degrees vertically. Light enters within this range, is processed by the optic nerve, and is recognized as visual information. Even if a person exists in a fully three-dimensional space, their visual perception is limited to this roughly 200-by-125-degree range. This means that as long as a 360-degree video dataset is available, a system can be designed to dynamically adjust the displayed visual field based on head movement or rotation. With specialized goggles, this allows for the creation of a 3D visual world. By completely covering the field of view, the visual experience itself can be virtualized.
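As a rough illustration of the head-tracking idea above, the sketch below checks whether a given direction in a 360-degree video falls inside the viewer's horizontal field of view. The function names and the 200-degree default are assumptions made for this example (the default simply reuses the figure quoted above); it ignores the vertical axis and is not taken from any particular VR system.

```python
def visible_yaw_range(head_yaw_deg, fov_deg=200.0):
    """Return the (left, right) edges of the visible horizontal range.

    Angles are in degrees, wrapped to [0, 360). 0 is straight ahead in the
    video's own coordinates; head_yaw_deg is where the viewer is facing.
    """
    half = fov_deg / 2.0
    left = (head_yaw_deg - half) % 360.0
    right = (head_yaw_deg + half) % 360.0
    return left, right


def is_visible(source_yaw_deg, head_yaw_deg, fov_deg=200.0):
    """True if a direction in the 360-degree video is inside the field of view."""
    # Signed angle between the source and the gaze direction, in [-180, 180).
    diff = (source_yaw_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


if __name__ == "__main__":
    # Something directly behind the viewer is hidden until the head turns.
    print(is_visible(180.0, head_yaw_deg=0.0))
    print(is_visible(180.0, head_yaw_deg=90.0))
```

The point of the sketch is that vision only ever needs a window cut out of the full sphere, so turning the head simply moves the window.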

Now, how does this compare to sound? Unlike vision, human hearing takes in the full 360-degree environment at all times. For example, if there is a loud noise in an apartment building, we can tell whether it came from the neighboring unit, the floor above, or the floor below. We can do this because sound waves (air pressure fluctuations) reach our two eardrums at slightly different times and levels, and because the shape of our ears helps us recognize direction and distance.
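To put a number on "slightly different times", here is a minimal sketch using the Woodworth spherical-head approximation, a textbook model that I am adding for illustration and that is not part of the original column. The head radius is an assumed average value, and the function name is hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, in meters
SPEED_OF_SOUND = 343.0   # meters per second in air at about 20 degrees C


def itd_seconds(azimuth_deg):
    """Approximate interaural time difference for a distant source.

    azimuth_deg: 0 is straight ahead, +90 is directly to the right.
    Returns a positive value when the sound reaches the right ear first.
    """
    az = (azimuth_deg + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    sign = 1.0 if az >= 0 else -1.0
    lateral = abs(az)
    if lateral > 90.0:
        # A source behind the head produces the same delay as its
        # mirror image in front of the head.
        lateral = 180.0 - lateral
    theta = math.radians(lateral)
    return sign * (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 90, 150):
        print(az, round(itd_seconds(az) * 1e6), "microseconds")
```

Under this model a source at 30 degrees and one at 150 degrees give the same delay, so timing alone cannot tell front from back. That is where the ear-shape cues mentioned above come in.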

Building on this, let’s consider how to virtualize sound.

Virtual sound refers to audio that does not physically exist in a location but appears to originate from a specific point in space. The simplest approach to achieving this is to place numerous speakers around a listener and output sound from those locations.

This concept is similar to multi-channel audio, as seen in surround sound or Dolby Atmos, where multiple speakers, including overhead ones, are installed and specific sounds are assigned to different channels. However, it is impractical to place speakers exactly where they are needed at all times, and as the number of channels increases, managing them becomes a burden for content creators. To address this, immersive sound primarily employs an object-based approach. In simple terms, object-based audio records pan (position) and volume data for each sound, so the playback system can place that sound in space without the mix being tied to fixed speaker locations.

While the technical side of this is crucial for hardware developers and installers, end users and creators do not necessarily need to concern themselves with the details. The real excitement for listeners is experiencing sound that moves in 3D space, while for creators the key is being able to position and move sound freely with panning software. In practice, formats like Dolby Atmos and DTS:X use a hybrid approach that combines multi-channel and object-based techniques. Although I haven’t personally used Auro-3D, it follows a similar principle: essentially, placing as many speakers as possible to create a virtual soundscape.

This method works with our natural spatial hearing, tricking the brain into perceiving sounds as originating from specific locations. In that sense it virtualizes hearing, though admittedly not quite as convincingly as visual VR.
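The object-based idea can be sketched in a few lines: the sound carries only a position and a volume, and a renderer turns those into per-speaker gains for whatever layout is installed. The code below is a simplified, horizontal-only illustration using constant-power panning between the two nearest speakers. It is not how the Dolby Atmos or DTS:X renderers actually work, and `AudioObject` and `render_gains` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    azimuth_deg: float   # 0 = front, positive = to the right
    gain: float = 1.0    # the "volume data" stored with the object


def render_gains(obj, speaker_azimuths_deg):
    """Return one gain per speaker for a horizontal ring of speakers.

    The object is panned between the two speakers that bracket its
    direction, with sine/cosine weights so total power stays constant.
    Assumes at least two speakers at distinct angles.
    """
    n = len(speaker_azimuths_deg)
    if n < 2:
        raise ValueError("need at least two speakers")
    # Visit the speakers in clockwise order around the listener.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360.0)
    az = obj.azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]
        span = (speaker_azimuths_deg[j] - speaker_azimuths_deg[i]) % 360.0
        offset = (az - speaker_azimuths_deg[i]) % 360.0
        if span == 0.0:
            continue  # two speakers at the same angle, skip this pair
        if offset <= span:
            frac = offset / span
            gains[i] = obj.gain * math.cos(frac * math.pi / 2)
            gains[j] = obj.gain * math.sin(frac * math.pi / 2)
            break
    return gains


if __name__ == "__main__":
    voice = AudioObject("voice", azimuth_deg=0.0)
    print(render_gains(voice, [-30.0, 30.0]))                       # stereo pair
    print(render_gains(voice, [0.0, 30.0, -30.0, 110.0, -110.0]))   # 5-speaker ring
```

The same object data produces different speaker gains for the stereo pair and for the five-speaker ring, which is the sense in which the mix no longer depends on a fixed speaker layout.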

In this way, when 3D sound is created by physically placing speakers around the listener, existing technology can already position sound virtually. However, this approach comes with a condition: the listener must remain stationary, as in a movie theater. If they move, the carefully adjusted panning breaks down and sound positions are no longer represented accurately. Software now exists that can adjust panning based on a person’s position relative to the speakers, but if multiple people move freely within a room, speakers alone cannot accommodate them all, because a speaker setup can only present one sound image at a time. This makes it unsuitable for interactive experiences where people move freely, or for multiplayer gaming.
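A small geometric sketch shows why movement is a problem. It computes the direction a virtual source should appear to come from, as seen by a listener at a given position; the coordinate convention and the function name are assumptions made for this example.

```python
import math


def relative_azimuth_deg(source_xy, listener_xy, listener_facing_deg=0.0):
    """Direction of a virtual source as heard by a listener, in degrees.

    Coordinates are in meters on the floor plan: +y is the front of the
    room and +x is to the right. 0 means straight ahead of the listener,
    positive means to the listener's right. Result is in [-180, 180).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    absolute = math.degrees(math.atan2(dx, dy))
    return (absolute - listener_facing_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    source = (0.0, 2.0)   # virtual sound 2 m in front of the room's center
    print(relative_azimuth_deg(source, (0.0, 0.0)))   # listener at the center
    print(relative_azimuth_deg(source, (1.0, 0.0)))   # listener 1 m to the right
    print(relative_azimuth_deg(source, (2.0, 2.0)))   # listener level with the source
```

Each listener position needs a different direction for the same virtual source, so a single set of speaker gains can be correct for only one of them at a time.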

For this reason, immersive sound using headphones, as mentioned earlier, is likely to become the mainstream approach. However, this introduces another challenge. With headphones or earphones, the sound is generated from a unit positioned close to the eardrum. This prevents the human ear from using natural cues such as ear shape and spatial reflections to perceive position and distance. This is a significant issue for human auditory perception. To overcome this, technologies such as DTS Headphone:X have been developed, utilizing psychoacoustics, subtle panning, reverberation, and frequency characteristics to recreate 3D sound within headphones. Currently, the audio industry is actively pursuing advancements in this field.
However, the quality is still far from perfect, particularly when it comes to accurately reproducing vertical positioning. If we were to develop a full-face helmet-style headset with built-in speakers at the top, it might improve the experience. But let’s be honest—no one is going to willingly wear a full-face helmet at home just for audio immersion!
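For a feel of what headphone processing has to do, the sketch below applies only the two simplest cues, a small delay and a level drop at the far ear, to a mono signal. It is a toy illustration and not how DTS Headphone:X works: real systems add frequency shaping and reverberation, which is also why this toy version cannot convey height or front/back. The 6 dB maximum level difference is an arbitrary assumed value, and the function name is hypothetical.

```python
import math

SAMPLE_RATE = 48_000     # samples per second
HEAD_RADIUS_M = 0.0875   # assumed average head radius, in meters
SPEED_OF_SOUND = 343.0   # meters per second
MAX_ILD_DB = 6.0         # assumed level drop at the far ear for a source at 90 degrees


def pan_for_headphones(mono, azimuth_deg):
    """Return (left, right) sample lists for a source at the given azimuth.

    azimuth_deg is clamped to [-90, 90]; positive means to the right.
    """
    theta = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    # Spherical-head estimate of the arrival-time difference between ears.
    itd = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (abs(theta) + math.sin(abs(theta)))
    delay = round(itd * SAMPLE_RATE)   # delay in whole samples
    far_gain = 10.0 ** (-MAX_ILD_DB * abs(math.sin(theta)) / 20.0)
    near = list(mono) + [0.0] * delay                     # arrives first, full level
    far = [0.0] * delay + [s * far_gain for s in mono]    # arrives later, quieter
    # For a source on the right, the right ear is the near ear.
    return (far, near) if theta > 0 else (near, far)


if __name__ == "__main__":
    left, right = pan_for_headphones([1.0], 90.0)   # a single click, hard right
    print(len(left), left[-1], right[0])
```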

Completely virtualizing auditory perception is still a difficult challenge. This is likely because hearing relies on far more relative cues than vision does.

The technology has come a long way, but I feel that software alone has its limits. Rather than spending time on simulations within software, wouldn’t it be faster to develop immersive-oriented earphones or headphones? Simply placing a sound unit behind the ears could significantly improve the experience—or at least, that’s my guess.

Now, this might turn everything upside down, but while we keep talking about 3D and 360-degree sound, the truth is that audio isn’t actually 360 degrees. That’s because we don’t account for sound coming from below. In the real world, we don’t expect a dog to bark at us from underground or for someone to call out to us from beneath our feet. However, we do hear sounds from below—our own footsteps, the clinking of coins dropped at our feet, and many other sounds. For example, with current technology, we can create visuals that make it look like we’re walking on a sandy beach when we look down (though replicating the physical sensation is still impossible). But can we realistically produce the sound of sand being crushed underfoot from below? That’s much harder. Close-range sounds are also challenging—there’s no way to specify an absolute distance, such as “this sound must originate exactly X meters away.”
In other words, the immersive cockpit we’re sitting in right now isn’t like the fully enclosed panoramic cockpit of a Gundam. It’s more like an old Zaku—flawed, with blind spots. Not sure what I mean? Well, it doesn’t really matter if you don’t get it.

This is an exciting field with huge potential. I plan to keep a close eye on developments and incorporate them into my own creative work.



Taiyo Haze

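Guitarist, sound engineer (mixer, MA, PA), composer, web designer (engineer), poet, health-conscious, coffee, coffee shops, avid reader, baseball fan. At every point in my life I have worked earnestly on my own interests and curiosity, and on the knowledge and skills my circumstances demanded at the time. Above all, I was fortunate that whenever I was learning, I could gain experience and hone my craft under mentors who were walking encyclopedias of their fields. In my twenties especially, almost maniacally deep experiences and single-minded devotion gave me a sensibility that nothing can shake. I get interested in anything, but I also get bored easily. The only things I have kept up for more than three years are probably music, baseball (I no longer play), reading (especially science fiction), and the web. Once something passes the three-year mark, I tend to feel I will keep doing it for life.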
website https://kaosway.com/
twitter https://twitter.com/kaosway
instagram https://www.instagram.com/kaosway/