Timber Blog

3 - Timber Blog - Converting a surround mix to 3D/binaural

As 3D audio developers we receive requests from media/film companies to help them convert their existing surround (5.1 or 7.1) catalogue to 3D audio. Ideally they do not want to remix anything; they are looking for a process that converts the existing surround mix into 3D on the fly, maybe even during playback.
This seems to be the holy grail for many film companies; a cheap and easy trick to have their whole catalogue played back in 3D audio.

Many think it is just a matter of taking five 3D engines, positioning each engine’s source where the speaker should be, and positioning the listener of all five engines in the sweet spot (the centre of the surround circle). Then you route each surround channel (except the LFE channel) through one of these 3D engines and you’re in business. It sounds very logical, with success guaranteed! If you’ve read our blog #2 you might already understand that it is not that simple.

What’s the difference between listening to a stereo mix over speakers or over headphones?

Mixing engineers in both music and film generally mix over speakers. In music it is quite common to verify the mix over headphones, but the first focus is always to have a good mix going over the speakers. In film mixing, checking the mix on headphones is rather uncommon for the simple reason that most films are mixed in surround, and the surround mix would need to be down-mixed to stereo before you can monitor it on headphones.
In other words, when you want to hear a conventional mix as it was intended, you should use speakers.

What changes when you listen over headphones? A couple of things:
• Low frequencies
One difference between listening to a mix through speakers or headphones is the behaviour of the low frequencies; most headphones have a poor or unrealistic low frequency response. And with low frequencies physics kicks in: in a cinema you will not only hear the low frequencies from an explosion, you will also feel them thanks to the subwoofers present in the venue.
At home this usually doesn’t happen, because most home subs do not have enough power to create this physical experience.

• Channel separation
The biggest difference between speakers and headphones, however, is the channel separation. When listening to speakers, sounds from the left speaker will also arrive at the right ear, and this is influenced by the position of the listener’s head (if the listener turns his head, the stereo image will shift). This means that even if a sound is played back by the left speaker only, our right ear will still hear it.
If the same mix is played over headphones there is almost 100% channel separation: the same sound panned fully left will now be inaudible to the right ear. In other words, the stereo image becomes more extreme on headphones. A mix with a sound panned to the extreme left works fine over speakers, but the same mix over headphones sounds awkward and annoying, because our brain instinctively feels that it is unnatural to experience a sound in one ear and not in the other.
(In music mixing it is common knowledge that you can pan an instrument fully to one side, as long as an instrument with a similar part plays on the opposite side to avoid this effect.)

• Room interaction
Another difference is that speakers interact with the room. A sound played by the left speaker only will reflect against the walls, ceiling and floor. This diffuses the sound and its point of origin slightly, and makes it become ‘part of the room’. In a way some reverb is added which will reach both ears. On headphones this does not happen: the sound reaches each ear directly and independently, and no ‘room acoustics’ are added.

The fourth difference is that speakers are placed in front of the listener, so that the sound seems to come from a certain distance, while on headphones everything seems too close, and located too far to the left and right sides.
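
The channel-separation difference can be illustrated with a toy model. This is only a sketch: the function name `ear_levels` and the crosstalk gain of 0.5 are invented for illustration, and real crosstalk between speakers and ears is also delayed and filtered by the head, which is ignored here.

```python
def ear_levels(left_channel, right_channel, crosstalk_gain):
    """Return the level arriving at (left ear, right ear).

    crosstalk_gain is the share of each channel that leaks to the opposite
    ear: above zero for speakers, zero for headphones (assumed values).
    """
    left_ear = left_channel + crosstalk_gain * right_channel
    right_ear = right_channel + crosstalk_gain * left_channel
    return left_ear, right_ear


# A sound panned fully left: only the left channel carries signal.
over_speakers = ear_levels(1.0, 0.0, crosstalk_gain=0.5)
over_headphones = ear_levels(1.0, 0.0, crosstalk_gain=0.0)

print("speakers:  ", over_speakers)    # the right ear still receives some signal
print("headphones:", over_headphones)  # the right ear receives nothing
```

Over speakers the right ear still gets part of the hard-left sound; over headphones it gets none, which is the ‘extreme’ stereo image described above.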

How does 3D position work?

3D audio processing is based on two sets of xyz coordinates: x for left to right, y for height and z for front to back.
One set of xyz coordinates represents the position of the sound source in a 3D space. The second set represents the position of the listener.
Both sets of coordinates describe a position within the room. In our software we can specify and modify the size of the room in all directions and position the listener and source anywhere in the room.

Different software uses different coordinate systems, but let’s say that we call the centre of the room at floor level 0, 0, 0 (x, y, z), with all values in cm.
You might place the listener in the centre, with his ears at a height of 160 cm. So the listener would have coordinates 0, 160, 0.
Now you can position the sound source two meters in front of the listener and two meters to the left, at a height of one meter above the floor.
In xyz coordinates in cm that would be: -200 (to the left), 100 (height), 200 (in front).
Based on these two sets of coordinates and the size of the room our software will calculate the differences between the left and right ear for the sound source. If you use more than one 3D engine in a production (because you have for instance two helicopters flying) you can give each source a unique set of coordinates, but for each 3D engine you would have to put the listener in the same position, and all 3D engines would use the same room size. 
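
As a rough illustration of what can be derived from the two coordinate sets, the sketch below computes the distance and direction of the source as seen from the listener. It reads the example numbers above as room coordinates in cm. The function name `relative_position` and the angle conventions (0 degrees is straight ahead, negative azimuth is to the left) are assumptions made for this example, not a description of how our software works internally.

```python
import math


def relative_position(source, listener):
    """Return (distance, azimuth in degrees, elevation in degrees).

    Both arguments are (x, y, z) tuples: x left to right, y height,
    z front to back, all in the same unit.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dz = source[2] - listener[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Angle in the horizontal plane, measured from straight ahead.
    azimuth = math.degrees(math.atan2(dx, dz))
    # Angle above (positive) or below (negative) ear height.
    elevation = math.degrees(math.asin(dy / distance)) if distance > 0 else 0.0
    return distance, azimuth, elevation


listener = (0, 160, 0)     # centre of the room, ears at 160 cm
source = (-200, 100, 200)  # 2 m left, 1 m above the floor, 2 m in front

distance, azimuth, elevation = relative_position(source, listener)
print(f"distance {distance:.0f} cm, azimuth {azimuth:.1f} deg, "
      f"elevation {elevation:.1f} deg")
```

With these numbers the source sits 45 degrees to the left and slightly below ear height, at a little under three meters from the listener.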

Converting stereo to 3D

Before looking at converting a surround mix, let’s first take a stereo example. We have a dog barking panned half to the left. In other words, the left speaker plays the bark approx. twice as loud as the right speaker. If we were to start the 3D production from scratch we would use a mono (see blog #2) dog bark recording, get one 3D engine, and position the sound source somewhat to the left and a couple of meters in front of the listener, who is positioned in the centre.
That’s simple and will work and sound great! The 3D engine would calculate not only the volume differences between the left and right ear, but also phase/delay, frequency differences and early reflections. With a mono input you would get a stereo output with all spatial information added to the dog bark.
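
To give a feel for the size of the delay between the ears, here is a minimal sketch using Woodworth’s spherical-head formula. This is a textbook approximation and not necessarily what any particular 3D engine does; the head radius of 8.75 cm and the speed of sound of 343 m/s are assumed values, and the function name is hypothetical. It covers the delay only, not the volume, frequency or reflection cues.

```python
import math


def interaural_time_difference(azimuth_deg, head_radius_m=0.0875,
                               speed_of_sound=343.0):
    """Approximate delay in seconds between the near and the far ear.

    Woodworth's formula for a rigid sphere: (r / c) * (theta + sin(theta)),
    with theta the angle away from straight ahead, limited to 90 degrees.
    """
    theta = math.radians(min(abs(azimuth_deg), 90.0))
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))


for azimuth in (0, 30, 90):
    delay_ms = interaural_time_difference(azimuth) * 1000.0
    print(f"{azimuth:3d} degrees off centre: about {delay_ms:.2f} ms")
```

A source straight ahead gives no delay at all, while a source fully to one side gives a delay in the order of two thirds of a millisecond.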

If we do not want to start from scratch but use the existing stereo mix, we’d have to do something different:
We would need two 3D engines: the source of the first engine would be positioned where the left speaker would be, and the source of the second engine where the right speaker would be. For both engines we would place the listener in the sweet spot.

Since we’re starting with a stereo mix of the dog bark we will feed the two 3D engines each with a slightly different dog bark. The left channel of the stereo mix will be routed to the 3D engine that simulates the left speaker, while the right channel of the stereo mix will be routed to the 3D engine that simulates the right speaker.
The left and right channel of the stereo dog bark recording will surely be different in volume (left louder than right), but might also have differences in phase and frequency content.

The left 3D engine would play the loud left channel dog bark, and the softer, slightly different right dog bark would be played in the right 3D engine. Both engines would mimic frequency, phase etc. of those two positions to position each bark correctly in 3D space, and each 3D engine would output the bark in stereo with all spatial information added: one bark positioned where the left speaker sits, and one where the right speaker sits.

But wait… Wasn’t one 3D engine placing an object exactly at one point in a 3D space? So if two engines play almost the same sound at two different points in space, what happens then? Do we hear two dogs now? No, we’d hear the sum of both locations. The two outputs are summed, including their differences in phase, frequency etc., for both the left and right channel, and what you hear is the sum of both barks. And because the left and right dog bark differ from each other in volume, frequency and phase, the summing would altogether produce a new dog bark.
Does that still give a 3D experience? Well, sort of. But a far cry from what you would have heard if the mix was done from scratch in 3D. 
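
The summing can be made visible with a deliberately crude model in which each 3D engine only applies a delay and a gain per ear. All numbers (the 3-sample delay, the 0.6 gain, the single-click ‘bark’) are invented for illustration, and the helper names are hypothetical; a real engine applies full filters and reflections.

```python
def place(signal, path, length):
    """Delay and scale a signal: a crude stand-in for one ear of one engine.

    path is a (delay in samples, gain) pair.
    """
    delay, gain = path
    out = [0.0] * length
    for i, sample in enumerate(signal):
        if i + delay < length:
            out[i + delay] += gain * sample
    return out


def add(a, b):
    """Sum two equally long signals sample by sample."""
    return [x + y for x, y in zip(a, b)]


LENGTH = 6
NEAR = (0, 1.0)  # path to the ear on the same side as the virtual speaker
FAR = (3, 0.6)   # later and quieter at the opposite ear (invented numbers)

bark = [1.0]                             # a single click standing in for the bark
left_channel = [1.0 * s for s in bark]   # bark panned half left:
right_channel = [0.5 * s for s in bark]  # right channel at half the level

# Each ear receives the output of BOTH engines, summed.
left_ear = add(place(left_channel, NEAR, LENGTH),
               place(right_channel, FAR, LENGTH))
right_ear = add(place(left_channel, FAR, LENGTH),
                place(right_channel, NEAR, LENGTH))

print("left ear: ", left_ear)
print("right ear:", right_ear)
```

Each ear now receives two arrivals of the bark instead of one. A single engine fed with a mono bark would deliver exactly one arrival per ear, which is why the summed result is ‘a new dog bark’ rather than a bark at one clear position.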

Converting surround to 3D

Now scale this up to a surround mix in 5.1. Imagine a helicopter right above your head. (Well, surround of course doesn’t do ‘above’; it would be more ‘on your head’.) Assuming you’re sitting in the centre, then in surround all five speakers would play the helicopter equally loud. Let’s convert this to 3D:
You instantiate five 3D plugins, and position all sources at the ideal speaker locations for a surround setup around the listener who would be in the centre for all five instances.
Now the five surround speaker channels that all contain the helicopter sound are sent to these five 3D engines. Each 3D engine will position its input, the helicopter sound, at the 3D position of one of the five speakers, and each 3D engine will output this in stereo. What you will hear is the sum of one sound positioned five times, once in each of the five speakers.

In other words, you hear the sum of five different positions. Or, more technically, the sum of 5 HRTFs. That is something different than a helicopter played back through only one HRTF: the one that mimics the timing and frequencies of a helicopter above your head. Again, it will still give some sort of 3D experience, but a far cry from a helicopter properly mixed in 3D.
To say it in short: you’re trying to reposition a sound that was already positioned using a different technique. 
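
That ‘sum of five HRTFs’ can be checked numerically. The sketch below, for one ear only, uses short made-up impulse responses as placeholders for the five speaker positions (they are not measured HRTF data, and the names are hypothetical). It shows that running the same signal through five engines and summing the outputs equals running it once through a single filter that is the sum of the five.

```python
def convolve(signal, impulse_response):
    """Plain convolution: apply a filter to a signal."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_all(signals):
    """Sum several equally long signals sample by sample."""
    return [sum(samples) for samples in zip(*signals)]


# Placeholder left-ear impulse responses, one per virtual speaker.
# Invented numbers, only there to make the arithmetic visible.
SPEAKER_IRS = {
    "L": [1.0, 0.2, 0.0, 0.0],
    "C": [0.0, 0.8, 0.1, 0.0],
    "R": [0.0, 0.0, 0.6, 0.1],
    "Ls": [0.7, 0.3, 0.1, 0.0],
    "Rs": [0.0, 0.0, 0.4, 0.2],
}

helicopter = [1.0, -0.5, 0.25]  # the same signal in all five channels

# Five engines, one per speaker, outputs summed.
five_engines = add_all([convolve(helicopter, ir) for ir in SPEAKER_IRS.values()])

# One engine using a single filter that is the sum of the five.
combined_ir = add_all(list(SPEAKER_IRS.values()))
one_combined_filter = convolve(helicopter, combined_ir)

print(five_engines)
print(one_combined_filter)
```

The two results are identical, so the listener effectively hears the helicopter through one blended filter. That blend is not the filter that belongs to a position above the head.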


And we haven’t even mentioned movement. Imagine the helicopter starts flying in circles around you. Now, as you can imagine, things become even more complex, because in surround movement is simulated by varying the volume between the 5 speakers. So only the individual levels of the 5 HRTFs mimicking the speakers are varied.
On top of that, you miss the powerful 3D movement effect. If the helicopter was remixed in 3D and the movement was programmed in the 3D process, then the 3D processor would apply the right changes in reflections, phase and frequency depending on the position. It would sound different for front and back, up and down, and it would also apply Doppler, the change in pitch that happens when a sound moves away from or towards you.
Movement in surround is mainly based on volume differences, while movement in 3D is a whole different ballgame.
Because in surround our five speakers do not move, the five 3D engines mimicking the speakers wouldn’t calculate any specific changes in sound character based on movement. They would just play the five helicopter sounds at different volumes through the five HRTFs.
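
The contrast can be sketched in a few lines. The first function is one common way to move a sound between two adjacent speakers (a constant-power pan law; actual surround panners may use other laws), and it changes nothing but levels. The second is the standard Doppler formula for a moving source and a stationary listener, one of the cues a 3D process can add. The function names and the speed of sound of 343 m/s are assumptions for this example.

```python
import math


def pan_gains(position):
    """Constant-power gains for two adjacent speakers.

    position runs from 0.0 (fully in the first speaker) to 1.0 (fully in
    the second). Only the levels change; the sound itself does not.
    """
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


def doppler_frequency(frequency_hz, approach_speed_ms, speed_of_sound=343.0):
    """Frequency heard by a stationary listener from a moving source.

    approach_speed_ms is positive when the source moves towards the
    listener and negative when it moves away.
    """
    return frequency_hz * speed_of_sound / (speed_of_sound - approach_speed_ms)


for position in (0.0, 0.5, 1.0):
    first, second = pan_gains(position)
    print(f"pan {position:.1f}: gains {first:.3f} / {second:.3f}")

half_first, half_second = pan_gains(0.5)
approaching = doppler_frequency(100.0, 30.0)
receding = doppler_frequency(100.0, -30.0)
print(f"100 Hz approaching at 30 m/s: {approaching:.1f} Hz")
print(f"100 Hz receding at 30 m/s:    {receding:.1f} Hz")
```

The pan law only trades level between speakers, whereas the Doppler shift changes the pitch of the sound itself, which no amount of level variation between fixed virtual speakers will reproduce.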

So, not possible?

Of course, we do have some tricks up our sleeve. If you’re looking for a way to convert existing 5.1 or 7.1 content to 3D, then there are possibilities, and we do have ideas. We can apply some tricks that will improve the 3D effect. But we’ll never promise you the same experience as a remix in 3D. Because no matter what people say, it is not the same thing. We believe that a 3D remix of a film is and always will be a totally different experience than converting the surround mix to 3D.

Hybrid mixes
We do understand that for budget reasons it is not possible to remix most existing surround content in 3D. But hybrid options are possible, where we apply both techniques: re-routing part of the existing surround mix to 3D as described above, with some extra processing to increase the 3D effect, while remixing key elements like the helicopter or ambience tracks in true 3D.
This will greatly enhance the 3D experience of the listener, while for the film companies it is a matter of exporting the existing mix in surround stems, plus some extra stems that contain the key elements that need to be remixed in 3D (and which are then obviously no longer present in the surround stems).

Want to know more?

Feel free to contact me at daniel -at-

Timber 3D 

Next Blog #4: Side chaining and other processing in 3D