Sunday 3 November 2013

Sample Rate Matters - II.

In yesterday’s post we found ourselves wondering whether a high-res recording needs to expand its high frequency limit beyond 20kHz, and whether squeezing a brick-wall filter into the gap between 20kHz and 22.05kHz is really such a good idea.  Today we will look at what we might be able to do about those things.

First, let’s ignore the extension of the audio bandwidth above 20kHz and look at the simple expedient of doubling the sample rate from 44.1kHz to 88.2kHz.  Our Nyquist Frequency will now go up from 22.05kHz to 44.1kHz.  Two quite interesting things are going to happen.  To understand these we must look back at the two brick-wall filters we introduced yesterday, one protecting the A-to-D converter (ADC) from receiving input signals above the Nyquist Frequency, and the other preventing the output of the D-to-A converter (DAC) from containing aliased components of the audio signal at frequencies above the Nyquist Frequency.  They were, to all intents and purposes, identical filters.  In reality, not so, and at double the sample rate it becomes evident that they have slightly different jobs to do.

We start by looking at the filter protecting the input to the ADC.  That filter still has to provide no attenuation at all at 20kHz and below, but now the 96dB of attenuation it must provide need only be reached at 44.1kHz and above.  That requirement used to be 22.05kHz and above.  The distance between the highest signal frequency and the Nyquist Frequency (the transition band) is now more than 10 times wider than it was before!  That is a big improvement.  But let’s not get carried away - it is still a significant filter, one needing a roll-off rate of roughly 84dB per octave.  By comparison, a simple RC filter has a roll-off rate of only 6dB per octave.
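
To put numbers on that, here is a minimal sketch (in Python) of the arithmetic.  The 96dB target and the 20kHz passband edge are the figures used in these posts; the roll-off quoted is simply the required attenuation averaged over the transition band:

    import math

    def avg_rolloff_db_per_octave(passband_hz, stopband_hz, atten_db=96.0):
        """Average roll-off rate needed to reach atten_db of attenuation
        between the passband edge and the stopband edge."""
        octaves = math.log2(stopband_hz / passband_hz)
        return atten_db / octaves

    # CD-rate ADC filter: flat to 20kHz, -96dB by the 22.05kHz Nyquist Frequency
    print(avg_rolloff_db_per_octave(20_000, 22_050))   # ~680 dB per octave

    # The same job at an 88.2kHz sample rate: -96dB by 44.1kHz
    print(avg_rolloff_db_per_octave(20_000, 44_100))   # ~84 dB per octave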

Now we’ll look at the filter that removes the aliasing components from the output of the DAC.  Those components are aliases of the signal frequencies, which are all below 20kHz.  As described in Part I, those aliases will be generated within the band of frequencies that lies between 68.2kHz and 88.2kHz.  If there is no signal above 20kHz, then there will be no aliasing components below 68.2kHz.  Therefore the requirements for the DAC’s anti-aliasing filter are a tad easier still.  We still need our brick-wall filter to be flat below 20kHz, but now it can afford to roll over more slowly, and only needs to reach its full 96dB of attenuation by 68.2kHz.

Doubling the sample rate yet again gives us more of the same.  The sample rate is now 176.4kHz and its Nyquist Frequency is 88.2kHz.  The DAC filter does not need to reach its full attenuation until 156.4kHz!  These filters are significantly more benign.  In fact, you can argue that since the aliasing components will all be above 156.4kHz they will be completely inaudible anyway - and might not even be reproducible by your loudspeakers!  Some DAC designs therefore do away entirely with the anti-aliasing filters when the sample rate is high enough.
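
If you want to check where those aliased components begin for any sample rate, the arithmetic is simply the sample rate minus the highest signal frequency.  A minimal sketch, assuming (as throughout these posts) that the audio content stops at 20kHz:

    def lowest_alias_hz(sample_rate_hz, audio_limit_hz=20_000):
        """Lowest frequency at which the DAC's aliased components can appear,
        assuming no signal content above audio_limit_hz."""
        return sample_rate_hz - audio_limit_hz

    for fs in (88_200, 176_400):
        print(fs, "Hz sample rate: aliases start at", lowest_alias_hz(fs), "Hz")
    # 88200 Hz sample rate: aliases start at 68200 Hz
    # 176400 Hz sample rate: aliases start at 156400 Hz

Calling the same function with audio_limit_hz=30_000 gives the corresponding figures for the wider 30kHz audio bandwidth discussed below.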

You can keep on increasing the sample rate, and make further similar gains.

Obviously, the higher sample rates also give us the option of encoding audio signals with a correspondingly higher bandwidth.  Equally obviously, that advantage comes at the expense of some of those filter gains, which vanish completely once the desired audio bandwidth is extended all the way out to the new Nyquist Frequency.  But even so, extending the high frequency limit of the audio signal out to 30kHz gives up little in filter performance, particularly at a sample rate of 176.4kHz.

So far I have only mentioned sample rates which are multiples of 44.1kHz, whereas we know that 96kHz and 192kHz are popular choices also.  From the point of view of the above arguments concerning brick-wall filters, 96kHz vs 88.2kHz (for example) makes no difference whatsoever.  However, there are other factors which come into play when you talk about the 48kHz family of sample rates vs the 44.1kHz family.  These are all related to what we call Sample Rate Conversion (SRC).

If you want to double the sample rate, one simple way to look at it is that you can keep all your original data, and just interpolate one additional data point between each pair of existing data points.  However, if you convert from one sample rate to another which is not a convenient multiple of the first, then very few - in some cases none - of the sample points in the original data will coincide with the required sample points for the new data.  Therefore more of the data - and in extreme cases all of the data - has to be interpolated.  Now, don’t get me wrong here.  There is nothing fundamentally wrong with interpolating.  But, without wanting to get overly mathematical, high quality interpolation requires a high quality algorithm, astutely implemented.  It is not too hard to make one of lesser quality, or to take a good one and implement it poorly.
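
To make the “convenient multiple” point concrete, here is a minimal sketch using Python’s standard fractions module; the 44.1kHz-to-96kHz conversion is simply an illustrative example of an inconvenient ratio:

    from fractions import Fraction

    # Doubling the rate: every original sample survives, and one new sample
    # is interpolated between each pair.
    print(Fraction(88_200, 44_100))    # 2

    # 44.1kHz -> 96kHz: the ratio reduces to 320/147, so the converter must
    # generate 320 output samples for every 147 input samples, and only one
    # input sample in every 147 lands exactly on an output sample instant.
    print(Fraction(96_000, 44_100))    # 320/147

In practice such rational-ratio conversion is usually done with a polyphase filter - scipy.signal.resample_poly(x, 320, 147) is one readily available implementation - and its quality rests on the low-pass filter designed inside it.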

Downconverting - going from a high sample rate to a lower one - is fraught with even more perils.  For example, going from 88.2kHz sample rate to 44.1kHz sounds easy.  We just delete every second data point.  You wouldn’t believe how many people do that, because it is easy.  But by doing so you make a HUGE assumption.  You see, 88.2kHz data has a Nyquist Frequency of 44.1kHz and therefore has the capability to encode signals at any frequency up to 44.1kHz.  However, music with a sample rate of 44.1kHz can only encode signals up to 22.05kHz.  Any signals above this frequency will be irrecoverably aliased down into the audio band.  Therefore, when converting from any sample rate to any lower sample rate, it is necessary to perform brick-wall filtering - this time in the digital domain - to eliminate frequency content above the Nyquist Frequency of the target sample rate.  This makes down-conversion a more challenging task than up-conversion if high quality is paramount.
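
Here is a minimal illustration of the trap, using a hypothetical 30kHz test tone (perfectly legal in an 88.2kHz file, but above the 22.05kHz Nyquist Frequency of the 44.1kHz target) and scipy’s decimate function as a stand-in for a proper brick-wall decimator:

    import numpy as np
    from scipy.signal import decimate

    fs = 88_200
    t = np.arange(fs) / fs                  # one second of 88.2kHz samples
    tone = np.sin(2 * np.pi * 30_000 * t)   # a 30kHz tone, representable at 88.2kHz

    # The "easy" way: delete every second sample.  The 30kHz tone is not removed;
    # it folds down to 44.1kHz - 30kHz = 14.1kHz, squarely inside the audio band.
    naive = tone[::2]

    # The correct way: low-pass filter below the new Nyquist Frequency first,
    # then discard every second sample.  The 30kHz tone is attenuated, not folded.
    proper = decimate(tone, 2)

A mastering-grade SRC would use a much longer and more carefully designed filter than decimate’s default, but the principle - filter first, then throw samples away - is the same.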

Time to summarize the salient points regarding sample rates.

1.  Higher sample rates are not fundamentally (i.e. mathematically) necessary to encode the best quality sound, but they can relax (or even eliminate) the need for brick-wall filters, which can be quite bad for sound quality.

2.  Higher sample rates can encode higher frequencies than lower sample rates.  Emerging studies suggest that human perception may extend to frequencies higher than can be captured by the CD standard’s 44.1kHz sample rate.

3.  Chances are that a high sample rate music track was produced by transcoding from an original which may have been at some other sample rate.  There is no certain way of knowing what the original was by examining the file, although clues - such as where its frequency content cuts off - can be suggestive.

4.  There is no fundamental reason why 96kHz music cannot be as good as 88.2kHz music.  Likewise 192kHz and 176.4kHz.  However, since almost all music is derived from masters using a sample rate which is a multiple of 44.1kHz, if you purchase 24/96 or 24/192 music your hope is that high quality SRC algorithms were used to prepare it.

5.  Try to bear in mind that if your high-res downloads are being offered at 96kHz and 192kHz, your music vendor may be run by people who pay more attention to their Marketing department than to their Engineering department.  That’s not an infallible rule of thumb, but it’s a reasonable one.  (Incidentally, that is what happened to BlackBerry.  It’s why they are close to bankruptcy.)