Well, today is my last official day working on BBC HD audio. Somehow I don’t think this project will leave me alone just yet, but after a week’s leave, my main focus will be elsewhere. So I thought I’d take the opportunity to talk about something which has consumed a fair bit of my time, but which I haven’t blogged much about: metadata. For the uninitiated, metadata is “data about data”. A photo’s metadata, for example, might tell you what camera it was taken with, where it was taken, what exposure was used and so on. In the case of BBC HD’s audio, metadata is carried by the Dolby E and Dolby Digital streams we use, and has two main functions: it describes the audio being carried, and it controls the decoders in your homes. One parameter, often called dialnorm (for Dialogue Normalisation), tells your decoder how loud the programme is, so that it can attempt to smooth out differences between programmes and channels to give you a more consistent loudness. Another set of parameters controls what happens when your decoder downmixes the audio, meaning when it produces a stereo mix for your stereo speakers from the surround sound we may be sending. It’s important stuff, so we have to make sure that metadata survives our distribution chain, and sometimes we even have to add metadata to a programme automatically, which can be tricky. Here’s some of the work we’ve done…
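
To make the dialnorm idea a little more concrete, here is a rough sketch of the arithmetic a decoder performs. It assumes the usual Dolby Digital convention that dialogue is normalised to -31 dB, so the decoder attenuates by the difference between the signalled dialnorm and -31. The function names are my own for illustration and are not part of any real decoder API.

```python
REFERENCE_LEVEL_DB = -31  # level that dialogue is normalised to (assumed convention)


def dialnorm_attenuation_db(dialnorm_db: int) -> int:
    """Return the attenuation in dB applied for a signalled dialnorm value."""
    if not -31 <= dialnorm_db <= -1:
        raise ValueError("dialnorm must be between -31 and -1 dB")
    # A louder programme (dialnorm closer to 0) is turned down by more.
    return dialnorm_db - REFERENCE_LEVEL_DB


def db_to_linear_gain(attenuation_db: float) -> float:
    """Convert an attenuation in dB to a linear multiplier for the samples."""
    return 10 ** (-attenuation_db / 20)


if __name__ == "__main__":
    for dialnorm in (-31, -27, -23):
        att = dialnorm_attenuation_db(dialnorm)
        print(f"dialnorm {dialnorm} dB: attenuate by {att} dB "
              f"(linear gain {db_to_linear_gain(att):.3f})")
```

So a programme signalled at -23 is turned down by 8 dB, while one signalled at -31 is left alone, and the two should end up at roughly the same perceived loudness.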

We’ve worked on making sure that metadata survives our distribution chain, but that’s all a bit dull, so I won’t bore you. If you want an entertaining metadata story, read Andy’s blog on our trial of Reverse Karaoke. What I will share is what happens when we have to create metadata in the delivery chain. Normally, for a surround sound programme, the sound engineer will create the metadata that goes with it, ensuring that the metadata matches the programme content well and that the effects of the metadata in your receiver aid the artistic intent of the mix rather than disrupting it. But stereo programmes are delivered to us without metadata, so we have to make it up. What values should we choose? And if something goes wrong somewhere, one of the first things to die could be the metadata, so if that happens we need to create new metadata. Again, what values to choose? The two metadata sets I’ve described here are referred to internally as “stereo metadata” and “reversion metadata”.

Stereo should be fairly easy. BBC HD uses Dolby Digital to send its audio, but SD channels use MPEG2 audio, and they have no metadata. So programmes mixed for stereo-only are mixed to a standard level – hopefully they all sound much the same volume. Therefore we don’t need to set a different dialnorm for each programme; all we need to do is choose one value for all stereo. We currently use -23dB, based on empirical testing and on consistency with other broadcasters who use the same value. Then there’s DRC, or Dynamic Range Control. This one’s a bit more tricky to explain, but basically it allows your receiver to reduce the dynamic range of the audio, which is the difference in volume between the quietest and loudest parts. So DRC makes the quiet bits a little louder, and the loud bits a little quieter. The idea is that we can broadcast programmes with a nice big dynamic range so that those with a high-end audio system can get cinematic effects, while allowing the decoder to reduce the range for those of you listening on small speakers in your telly for example, which won’t be able to produce such a range of volumes. So far so good, but stereo programmes are generally mixed for compatibility with stereo-only channels (i.e. all BBC channels except BBC HD), so they have a small dynamic range in the first place – they’re designed to work on all TVs and audio systems without dynamic range control. So recently, we switched from using a small amount of DRC in our stereo metadata to using none at all. This should ensure that stereo programmes sound the same on any channel, and we’re watching the results carefully.
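
If it helps, the "quiet bits louder, loud bits quieter" behaviour can be sketched as a simple gain curve. This is only an illustration of the general idea of dynamic range compression; it is not one of Dolby's actual DRC profiles, and the null band width and ratio below are made-up values.

```python
def drc_gain_db(level_rel_dialogue_db: float,
                null_band_db: float = 5.0,
                ratio: float = 2.0) -> float:
    """Gain in dB a simple compressor applies at a given signal level.

    The level is measured relative to the dialogue level (0 = dialogue).
    Inside the null band nothing changes; outside it, the distance from
    the band edge is reduced by the given ratio.
    """
    if abs(level_rel_dialogue_db) <= null_band_db:
        return 0.0
    if level_rel_dialogue_db > 0:
        excess = level_rel_dialogue_db - null_band_db  # how far above the band
    else:
        excess = level_rel_dialogue_db + null_band_db  # how far below (negative)
    # Loud passages (positive excess) get a cut, quiet ones get a boost.
    return -excess * (1 - 1 / ratio)


if __name__ == "__main__":
    for level in (-30.0, -10.0, 0.0, 10.0, 20.0):
        print(f"{level:+6.1f} dB relative to dialogue -> "
              f"gain {drc_gain_db(level):+.1f} dB")
```

Setting `ratio=1.0` gives zero gain everywhere, which corresponds to the "no DRC" choice described above for stereo metadata.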

OK, so what about reversion? Well this is trickier. Remember that this is what happens if things go badly wrong – not something we want to happen, but something we must prepare for. We have to come up with a set of metadata which works for all programmes as best we can, causing the least degradation to the biggest range of programmes so that if reversion happens, whatever programme we’re broadcasting will sound OK. So question one is this: do we tell your decoder that we’re sending 5.1 or stereo audio? The answer has to be 5.1 – if the metadata says 5.1 and a stereo programme is sent, your decoder will just reproduce the left and right channels in the left and right speakers. Any fancy Dolby Pro-Logic decoding won’t work, but you will hear the basic audio. If we did the opposite and signalled the programme as 2.0 (stereo), a 5.1 programme would be badly degraded, as you wouldn’t hear the centre and rear channels, which would probably mean you wouldn’t hear the dialogue!
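
The reasoning above can be sketched in code. The example below compares a stereo downmix of a 5.1 frame with what you would hear if only the front left and right channels were reproduced. The -3 dB centre and surround mix levels are common defaults that I am assuming here, the LFE channel is left out of the downmix, and the class and function names are purely illustrative.

```python
from dataclasses import dataclass

MINUS_3_DB = 10 ** (-3 / 20)  # about 0.708, an assumed default mix level


@dataclass
class Frame51:
    """One sample of each channel in a 5.1 programme."""
    left: float
    right: float
    centre: float
    lfe: float
    left_surround: float
    right_surround: float


def downmix_to_stereo(f: Frame51,
                      centre_gain: float = MINUS_3_DB,
                      surround_gain: float = MINUS_3_DB) -> tuple[float, float]:
    """Fold centre and surrounds into left/right (LFE omitted in this sketch)."""
    lo = f.left + centre_gain * f.centre + surround_gain * f.left_surround
    ro = f.right + centre_gain * f.centre + surround_gain * f.right_surround
    return lo, ro


def front_pair_only(f: Frame51) -> tuple[float, float]:
    """What is heard if a 5.1 programme is treated as plain 2.0 stereo."""
    return f.left, f.right


if __name__ == "__main__":
    # Dialogue normally sits in the centre channel of a 5.1 mix.
    dialogue = Frame51(0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
    print("5.1 signalled, downmixed:", downmix_to_stereo(dialogue))
    print("2.0 signalled, front pair:", front_pair_only(dialogue))

    # A stereo programme carried in the left/right of a 5.1 stream is unharmed.
    stereo = Frame51(0.5, -0.25, 0.0, 0.0, 0.0, 0.0)
    print("stereo in a 5.1 stream:", downmix_to_stereo(stereo))
```

The centre-only dialogue frame survives the downmix at a reduced level but vanishes completely when only the front pair is used, which is why signalling 5.1 is the safer choice.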

The DRC is the next question, and a relatively easy one – we stick with the default setting, which applies quite a lot of dynamic range processing. This will make sure that any 5.1 programming comes out of your speakers in a way that works for all programming and all speakers, even if it doesn’t sound so impressive on high-end systems. And while stereo programming might be affected a bit, it won’t seriously degrade the audio. The final important question is the dialnorm. Whatever happens, if the dialnorm doesn’t match the programme, you’ll hear the sound either too loud or too quiet. Since not all programmes are at the same level, there is no ‘perfect’ value to choose, so we simply have to make a best guess. The choice we made is to use a dialnorm of -23dB, the same as for stereo. What this means is that stereo programmes should sound normal, while surround sound programmes will likely sound too quiet (by a varying amount depending on the programme). Again, we based this decision on the least-worst effect it would have: surround being too quiet is less bad than stereo programmes being too loud (which would have been the other option), as people generally find things jumping up in volume more annoying.
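
Putting the two metadata sets side by side, along with the loudness error you get when the signalled dialnorm doesn’t match the programme, might look something like the sketch below. The field names and the DRC labels are hypothetical, the 2.0 channel mode for stereo metadata is my assumption, and the -27 dB surround programme is just an example value, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioMetadata:
    channel_mode: str   # "2.0" or "5.1"
    dialnorm_db: int    # signalled dialogue level
    drc: str            # hypothetical label for the DRC setting


# Values taken from the description above; labels are illustrative only.
STEREO_METADATA = AudioMetadata(channel_mode="2.0", dialnorm_db=-23, drc="none")
REVERSION_METADATA = AudioMetadata(channel_mode="5.1", dialnorm_db=-23, drc="default")


def loudness_error_db(actual_dialogue_level_db: float,
                      metadata: AudioMetadata) -> float:
    """Playback loudness error: positive is too loud, negative is too quiet.

    The decoder attenuates according to the signalled dialnorm, so the
    error is simply the actual level minus the signalled one.
    """
    return actual_dialogue_level_db - metadata.dialnorm_db


if __name__ == "__main__":
    # A stereo programme mixed to the standard level plays back correctly.
    print("stereo at -23:", loudness_error_db(-23, REVERSION_METADATA), "dB")
    # An example surround programme with quieter dialogue comes out too quiet.
    print("surround at -27:", loudness_error_db(-27, REVERSION_METADATA), "dB")
```

With these example numbers the stereo programme has no error, while the surround programme ends up 4 dB too quiet, which is the trade-off described above.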

So there you go, that’s metadata for you. We think we have a pretty strong system now, so that all surround and stereo programmes reach you with the best metadata settings possible, and even if things go wrong the results should be pretty good. We’ve also used some tricks with metadata to help us identify the source of a problem if one occurs, so as well as sounding better if things do go wrong, we can fix the problem faster. Some of you may be disappointed to hear it, but I don’t think we’ll be having any more Reverse Karaoke!

So that’s it from me! If anything particularly exciting happens in the world of BBC HD’s sound, I’ll try to let you know. And as I move up north to help develop a new Research and Development lab I’ll try to tell you a bit about that too, as I think it’ll be an exciting journey. Watch out for a new series of posts from me about that on this website, and for updates on BBC HD and the BBC’s wider technology work, keep reading BBC Internet Blog. Cheers!


I’m an engineer with the BBC and sharing information about my work, but this is my personal website. Because the subject matter here is fairly different to my personal posts, this post is part of a separate category, with its own RSS feed. You can therefore choose to only read my work-related posts, or to ignore them altogether.

5 Responses to “Metadata: Getting it right, even when it’s wrong.”
  1. “[apply quite a lot of dynamic range processing which] will make sure that any 5.1 programming comes out of your speakers in a way that works for all programming and all speakers, even if it doesn’t sound so impressive on high-end systems”

    So, so disappointing to hear you dismiss such an important topic like this. You are basically saying you’re happy to compromise ultimate quality to the benefit of users who will either not notice the difference, or care at all if they did. The opposite approach, reduced compression would benefit those who even are aware of such things, whilst probably not impacting at all on the majority.

    Let me put it in grossly oversimplified terms:

    Approach #1 – Max DRC
    98% population don’t notice/don’t care – happy
    2% population notice reduced dynamic range – unhappy
    Net result: 98% of population happy

    Approach #2 – No DRC
    98% population don’t notice/don’t care – happy
    2% population appreciate full dynamic range – happy
    Net result: 100% of population happy

    Why wouldn’t you want to keep 100% of your audience happy? Especially when the affected 2% are the only ones who actually care deeply about such things? Most of the 98% have their Freeview/Sky boxes connected to their 40″ LCD TVs via composite, leave all the default factory audio processing on (which probably includes some sort of DRC) and watch everything at the wrong aspect ratio anyway – they don’t deserve your consideration! :)

  2. Gordon,

    Thanks for reading, and taking the time to comment. However, I think you’ve missed the point here a little – allow me to explain. The situation I was talking about is a “reversion” situation. This means that the DRC we’ve chosen here will *only* be applied if something goes pretty badly wrong, e.g. if an incoming video feed arrives at us with malformed metadata *and* we don’t realise, (and our standard OB checks include making sure the metadata is correct, so this is very unlikely indeed) or the metadata dies mid-programme and we haven’t yet been able to correct it. To be clear, this situation has never occurred since BBC HD went on air – we’re talking a real last resort here.

    So given the last-resort nature of the situation, we accept some degradation in the audio for high-end users in order to ensure that the programme can continue at all. Bear in mind that if we hit such a bad problem, we probably don’t have a correct dialnorm value either, meaning that the output level of the audio could end up being wrong. So if it’s too loud, (i.e. the dialnorm used is too low for the programme) then users with cheap speakers or listening on their TV could have their speakers damaged if there’s a sudden peak in the audio. DRC aims to prevent this. We’re not talking about reducing the range in a normal situation, we’re talking about damage prevention in a problem situation. If we got this wrong it would be possible to damage users’ speakers – it’s unlikely, but possible. So I think you’ll agree that protecting them is more important than preserving maximum audio fidelity for a minority.

    In a normal programme situation, DRC is there exactly to preserve your ability to hear maximum dynamic range while users with lesser systems get a compressed version. It’s exactly for the 2% you talk about that we allow sound supervisors to set DRC to best match the programme and make it sound as good as possible. And we’ve removed all DRC from stereo programming, as I mentioned in the post. So I think we’ve got it right.

    I hope that clears things up a little.

    Cheers

    Rowan

  3. OK, thanks for the clarification. I guess I’m just used to hearing how often things are pitched with the lowest common denominator in mind. From reading your blog, and others from the BBC technical staff, I’m just glad it isn’t me who has to pick through the minefield of audio processing for digital TV!

  4. Why can’t the Beeb turn stereo channels with ProLogic into DD5.1 streams for us?

  5. Hi

    There are 2 answers to this really. The first is that few (if any) BBC programmes are ProLogic encoded. We don’t use this format generally. The second answer is not one I am qualified to give on behalf of the BBC as I’m not the person who makes production decisions. However my personal take on it is this: BBC HD is a “proper” HD channel, by which I mean that all programming is HD. We obviously allow small amounts of upscaled video as otherwise we couldn’t use archive clips etc, but all programming has to be primarily made of true HD material. While some broadcasters upscale their SD content to make HD simulcasts, we don’t believe that’s a good way to deliver value to the viewer for their licence fee. So the same standards would be applied to sound I suspect – ProLogic is not proper 5.1 surround sound, it only has 4 channels and no subwoofer. So to scale this up into 5.1 would not give a proper 5.1 experience in the same way that upscaled SD video doesn’t look half as good as true HD. So that, I think, may be why. If you want ProLogic, you can get decoders pretty cheap now :-)

    Cheers


© 2008 Rowan de Pomerai