Tuesday 10 September 2013

Everything You Always Wanted To Know About Dither (But Were Too Afraid To Ask)

Many digital audio Apps - BitPerfect included - have the ability to apply dither to the audio signal.  Many of those Apps provide a great deal of control over the type of dither employed, and at what stage it is added.  BitPerfect does not.  The reason is that when dither is applied it needs to be applied for very good reasons, in circumstances that indicate a requirement for dither, and using an appropriate choice of dithering algorithm.  BitPerfect allows you to choose between two types of dither - an unidentified algorithm provided by CoreAudio, and a BitPerfect-implemented TPDF (Triangular Probability Density Function) dither.  BitPerfect then decides when - and if - this dither should be applied.  It is done this way because experience shows very clearly that users of those Apps which offer in-depth user control over dithering routinely exercise that control unwisely.

So here is a brief tutorial on dither.  What it is, why we do it, and how it works.

Dither has its roots in dropping bombs during WWII.  Elaborate mechanical contraptions were devised to enable the bombardiers to aim their bombs as accurately as possible when dropping them from bombers while getting shot at.  In the safe confines of the engineering lab, the engineers could not get these devices to work with sufficient accuracy.  But wartime needs being expedient, they were installed into bombers anyway and pressed into service, where, much to the surprise of the engineers, they proved to be far more accurate than expected.  It turned out that the elaborate mechanisms were rather "sticky" in operation, and when installed on a bomber, the immense constant vibration jogged the mechanisms out of their "sticky" positions and caused them to function properly.  Those of you who still like to knock on an old analog meter before taking a reading are doing exactly the same thing.  Back in the lab, the engineers mounted the bombing aids onto a vibrating table, and all of a sudden were able to replicate the excellent in-service performance.  They termed this forced vibration "dither".

Dither is now a well-used art in digital signal processing, in video as well as audio.  I am going to focus on one particular aspect - the most important of the audio applications.  When a signal is digitized (and I will use the term 'quantized' from here on in), it is in effect assigned a very specific value, which, as often as not, is a measure of the amplitude of a voltage.  Because quantization means assigning the magnitude of the voltage to one of a limited number of fixed levels, it is inevitably the case that there is some residual error between the actual value of the voltage and the stored quantized value.  This error is called the quantization error, and what is typically done is to choose the quantization level which is closest to the actual value of the voltage, thereby minimizing the quantization error.  Are you with me so far?
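To make that concrete, here is a minimal Python sketch of "choose the closest level".  The function name `quantize` and the sample values are made up for illustration, and the step size stands in for one least significant bit.

```python
def quantize(x, step=1.0):
    """Round x to the nearest quantization level (a multiple of step)."""
    return step * round(x / step)

# Hypothetical input values, measured in units of one quantization step.
samples = [0.2, 0.7, 1.49, -2.6]

# Quantization error = stored value minus actual value.
errors = [quantize(s) - s for s in samples]

# Choosing the nearest level keeps every error within half a step.
max_abs_error = max(abs(e) for e in errors)
print(errors, max_abs_error)
```

Whatever values you feed in, the error never exceeds half a step, which is the "minimized" quantization error described above.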

It turns out that minimizing the quantization error is not necessarily the best way to go.  This method produces a quantization error signal that correlates quite well with the original signal.  In plain English, this means that the quantization error signal looks more like distortion than it does noise.  And we know that the human ear is far more tolerant of noise than it is of distortion.  But let's stop to think about this.  Done this way, the quantization error has a magnitude which is never more than one half of the magnitude of the least significant bit.  So it will only correlate with the original signal (and therefore produce distortion) if the original signal is sufficiently clean that when looked at with a magnifying glass that sees all the way down to the level of the least significant bit, the signal contains no additional noise.  But if the original signal does contain noise, and if the noise is of a magnitude that swamps the least significant bit, then the quantization error can only correlate with the noise and not with the signal, and the resultant quantization error signal will only comprise noise and no distortion.  Still with me?

So, if the original signal is clean and contains no noise, all we need to do is add some noise of our own, and any distortion components present in the quantization error signal will be replaced by noise components.  Although the magnitude of the noise we need to add turns out to be larger than the magnitude of the original distortion components, this noise turns out to be more pleasing on the ear.  A lot more pleasing, actually.  This added noise is what we call dither.
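A toy Python experiment shows the effect.  It is only a sketch under my own assumptions: a steady signal sitting at 0.3 of a quantization step, and added noise with a triangular distribution spanning plus or minus one step (the TPDF noise discussed further on).  The names `tpdf_dither`, `level` and the seed are invented for the example.

```python
import random

def tpdf_dither(rng):
    """Triangular-PDF noise spanning -1 to +1 LSB (difference of two uniform values)."""
    return rng.random() - rng.random()

rng = random.Random(1234)   # fixed seed so the run is repeatable
level = 0.3                 # a steady signal of 0.3 LSB, well below one step
n = 100_000

# Without dither every sample rounds to the same level: the signal is lost.
plain = [round(level) for _ in range(n)]

# With dither the output flickers between neighbouring levels.
dithered = [round(level + tpdf_dither(rng)) for _ in range(n)]

plain_mean = sum(plain) / n        # exactly 0
dithered_mean = sum(dithered) / n  # close to 0.3
print(plain_mean, dithered_mean)
```

Without the added noise the 0.3 simply vanishes.  With it, each individual sample is noisier, but the average of the output lands back on the original value, so the information survives as signal plus noise rather than being turned into a signal-dependent error.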

There are actually many different types of noise.  We want to add the best type of noise for the particular circumstances, and this is where it can get a lot more complicated.  BitPerfect uses TPDF (Triangular Probability Density Function) noise.  This type of noise has been shown mathematically to maximally suppress quantization error distortion with the minimum amount of added noise.  Other types of noise have other properties.  One of the most interesting is "Noise Shaped" noise.  This type of noise is more complicated, and in order to work properly has to be added within a frequency-sensitive feedback loop.  It has the interesting property that the added noise is actively shaped away from one portion of the frequency range (where the ear is most sensitive) and into another (where the ear is less sensitive).  Surprisingly, in the right circumstances, noise shaped dither is capable of pushing the noise floor in the part of the frequency range where the ear is most sensitive to a level below the notional theoretical limit imposed by the bit depth (approx 6dB per bit).
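For the curious, the feedback idea can be sketched in a few lines of Python.  This is the simplest textbook arrangement I know of (first-order error feedback with TPDF dither), not the algorithm used by BitPerfect, CoreAudio or any mastering tool, and the function names are my own.

```python
import random

def tpdf_dither(rng):
    """Triangular-PDF noise spanning -1 to +1 LSB."""
    return rng.random() - rng.random()

def noise_shaped_quantize(samples, seed=0):
    """First-order error-feedback quantizer (illustrative sketch only)."""
    rng = random.Random(seed)
    out = []
    err = 0.0
    for x in samples:
        v = x - err                      # feed back the error made on the previous sample
        y = round(v + tpdf_dither(rng))  # dither, then quantize to the nearest level
        err = y - v                      # total error (dither + rounding) on this sample
        out.append(y)
    return out

signal = [0.3] * 10_000                  # steady 0.3 LSB test signal
shaped = noise_shaped_quantize(signal)

# Each output error is the difference of two successive err values, so the
# errors cancel over time and the running total never drifts far from the input.
drift = abs(sum(shaped) - sum(signal))
print(drift)
```

Because each output error is the difference between two successive internal errors, the slow-moving (low frequency) part of the noise largely cancels and what remains is pushed towards high frequencies.  Real noise shapers use higher-order filters tuned to the sensitivity of the ear, but the principle is the same.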

It is important to appreciate that dithering the signal adds noise to it - noise that was not there before, and which can never be removed again afterwards.  If you dither an already-dithered music data stream, you will only be adding further noise to the already-added noise, which is not normally of much benefit - in fact it will usually degrade the signal.  In particular, adding noise-shaped dither to a data stream that has already received noise-shaped dither can raise the noise to quite unpleasant levels at higher frequencies.  Many CDs are mastered with a final application of noise-shaped dither, so if you have ripped one of these, you don't really want to be applying more noise-shaped dither when it comes to playback.  Unfortunately it takes a suite of Analytical DSP Apps to determine whether a given recording has been treated this way, and even those of us who do have these Apps generally cannot be bothered with doing it on any sort of routine basis!

I will comment specifically on two scenarios which will be of relevance to BitPerfect users.  Sample Rate Conversion and Digital Volume Control.

In BitPerfect's implementation of SRC, the 16-bit or 24-bit integer data is first transcoded to a 64-bit Float format.  SRC comprises some heavy mathematics operating on the 64-bit Float data, at the end of which we have a bunch of 64-bit Float numbers that need to be converted back to integers again.  64-bit Float numbers are stored with 53 bits of precision, and to convert them back to 16-bit (or 24-bit) integers we have to throw away roughly the least significant 37 bits (or 29 bits, respectively) of data.  The difference between the new 16-bit (or 24-bit) value and the original 53-bit precision becomes the new quantization error.  So it is wise to apply some dither.  Easiest to do would be to choose TPDF, and this would be a good choice.  With a 16-bit output format, there is some potential benefit to be had in applying noise-shaped dither, but you would need to be confident that no further noise-shaped dither is being applied downstream.  With 24-bit data, there is a very fair argument to be made that it does not need any dithering at all, since nothing below the 22nd bit is ever audible anyway.  But dithering 24-bit data can't hurt either way.
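The final conversion step might look something like the following Python sketch.  It is not BitPerfect's code.  I am assuming float samples normalized to the range -1.0 to +1.0, a scale factor of 32768, and clipping at the ends of the 16-bit range, and the function name `float_to_int16` is hypothetical.

```python
import random

def float_to_int16(samples, seed=0):
    """Convert floats in [-1.0, 1.0] to 16-bit integers with TPDF dither (sketch)."""
    rng = random.Random(seed)
    out = []
    for x in samples:
        scaled = x * 32768.0                    # 1 LSB of the 16-bit output = 1.0
        dither = rng.random() - rng.random()    # TPDF noise, -1 to +1 LSB
        y = round(scaled + dither)              # quantize to the nearest integer
        out.append(max(-32768, min(32767, y)))  # clip to the 16-bit range
    return out

silence = float_to_int16([0.0] * 1000)
full_scale = float_to_int16([1.0, -1.0])
print(sorted(set(silence)), full_scale)
```

Note that digital silence no longer comes out as a string of zeros.  It flickers between -1, 0 and +1, which is exactly the added noise doing its job.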

With digital volume control, we are usually only talking about digital attenuation.  Some DACs can provide an amount of digital gain, but these are relatively few, and digital gain is not normally of interest to audiophiles, so I will ignore it here.  Digital attenuation effectively reduces the bit depth of the music data.  Every 6dB of attenuation loses you one bit of resolution.  So 24dB of attenuation loses you 4 bits of data.  Those lost bits of data drop off the bottom end, into the digital void below the LSB, and so any dither present in the signal gets lost in the process.  It is therefore advisable to re-dither the signal after performing volume control.  TPDF dither would again be a good choice here, but this could also be an ideal place to introduce an appropriate noise-shaped dither function.
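The "6dB per bit" rule is easy to check with a little arithmetic.  In this Python sketch the helper names `bits_lost` and `attenuate` are my own, and the attenuation is applied without any dither so that the loss is plain to see.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB per bit

def bits_lost(attenuation_db):
    """Approximate number of bits of resolution given up to attenuation."""
    return attenuation_db / DB_PER_BIT

def attenuate(sample, attenuation_db):
    """Scale an integer sample down and re-quantize it, with no dither."""
    gain = 10 ** (-attenuation_db / 20)
    return round(sample * gain)

print(bits_lost(24))         # just under 4 bits
print(attenuate(1000, 24))   # 63
print(attenuate(1001, 24))   # 63 as well: the difference has been lost
print(attenuate(7, 24))      # 0: a very quiet sample disappears altogether
```

Two different input samples ending up as the same output value is what "losing bits" means in practice, and it is also why any dither that was in the bottom few bits of the original does not survive the volume control.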

In BitPerfect, we apply dither after all SRC operations, using either TPDF or CoreAudio according to the user selection, but we do not yet dither after volume control.  This is due to limitations in the way we have written our audio engine, but version 1.1 of BitPerfect will include a completely new audio engine that can perform real-time dithering on the volume control.

So spare a thought for those valiant WWII bombardiers.  All this was furthest from their minds!